ABSTRACT 

Recent studies have found substantial reductions in 
gender differences in the prediction of academic achievement in 
colleges when variations in grading standards among courses were 
taken into account. This project examined gender differences in the 
prediction of freshman grades after controlling for differential 
course grading based on college majors. This method involved deriving 
variables that measured grading leniency using residual scores from 
the within-gender regressions of freshman grades on high school 
grades and scores on the Scholastic Aptitude Test for the non-Latino 
white group. The procedure worked quite well and generalized to other 
groups not involved in the derivation of the grading leniency scale. 
Nevertheless, there were modest, sometimes statistically significant, 
gender differences in prediction that remained after this control 
variable was introduced into the regressions. The largest and 
smallest differences for females between actual grades and grades 
predicted from the males' regressions tended to be found in African 
American and Asian American groups respectively. The results imply 
that the use of information on college majors is a reasonable, 
practical procedure for controlling grading leniency. Thirteen tables 
present analysis results. (Contains 32 references.) (Author/SLD) 
Abstract 

Recent studies have found substantial reductions in 
gender differences in the prediction of academic 
achievement in college when variations in grading stan- 
dards among courses were taken into account. The pur- 
pose of this project was to examine gender differences 
in the prediction of freshman grades after controlling 
for differential course grading based on college majors. 
This method involved deriving a variable that measured 
grading leniency using residual scores from the within- 
gender regressions of freshman grades on high school 
grades and scores on the SAT for the non-Latino white 
group. The procedure worked quite well and general- 
ized to other groups not involved in the derivation of 
the grading-leniency scale. Nevertheless, there were 
modest, sometimes statistically significant, gender dif- 
ferences in prediction that remained after this control 
variable was introduced into the regressions. The largest 
and smallest differences for females between actual 
grades and grades predicted from the males' regressions 
tended to be found in the African American and Asian 
American groups, respectively. The results imply that 
the use of information on college majors is a reasonable, 
practical procedure for controlling for grading leniency. 



Introduction 

The objective of this study was to explore sources of 
possible gender differences in the prediction of college 
grades at four universities. The analyses focused on sep- 
arate contributions to these differences by individual 
predictors: high school grades and SAT scores, both 
verbal (SAT-V) and mathematical (SAT-M). Of special 
interest was the extent to which gender differences in 
predicted versus actual grades persisted after controlling 
for differential grading standards in the college courses 
taken by students majoring in different fields of study. 
All analyses were done separately by racial/ethnic 
groups within each university to examine variations in 
the size of gender differences across groups varying in 
cultural and language background. 

Background 

A large body of research on a variety of admission tests 
has shown that the prediction of grades in high school 
and higher education differs for males and females. 
Typically, males achieve lower grades than females in 
high school, college, and law school despite having 



higher test scores (see reviews by Clark and Grandy 
1984, College Board 1988, Linn 1982, and Wilder and 
Powell 1989; and large, more recent studies by Ramist, 
Lewis, and McCamley 1994, and Sawyer 1986). In ad- 
dition, the degree of relationship between predicted and 
actual grades and the correlations between academic 
performance in higher education and admission test 
scores are often stronger for females than for males 
(Linn 1982; Morgan 1990; Ramist et al. 1994; Sawyer 
1986). 

Several explanations have been proposed for these 
findings, including (1) disproportionate enrollment of 
males in college courses with harsher grading standards, 
such as the physical sciences; (2) lower percentages of 
females than of males taking high school science and 
mathematics courses, thus raising high school grades for 
females; (3) superior study habits and self-discipline 
among females; (4) superior writing skills among fe- 
males; and (5) bias in the tests. Since the major focus 
here is on grading standards, a full discussion of the 
other issues (three, four, and five) is beyond the scope of 
this paper. It can be said briefly that there is evidence 
partly supporting each of the explanations (Breland and 
Griswold 1982; Bridgeman 1989; Bridgeman and 
Wendler 1989, 1991; Bridgeman and Lewis 1991; 
College Board 1988; Ekstrom, Goertz, and Rock 1988; 
Elliott and Strenta 1988; Mazzeo, Schmitt, and 
Bleistein 1989; McCormack and McLeod 1988; Mullis 
and Jenkins 1988; Ramist et al. 1994; Stricker, Rock, 
and Burton 1991; Wilder and Powell 1989; Young 
1991). 

Many studies have established that grading stan- 
dards vary by field of study and that females tend to 
gravitate toward the more leniently graded courses and 
college majors. Typically, more males than females are 
interested in majoring in engineering and the physical 
sciences (Grandy 1987a, 1987b), whereas females are 
more often interested in the humanities and certain so- 
cial sciences. Grades of C or below are more frequent 
in engineering and the physical sciences than in the 
humanities and certain social sciences. These differences 
are found even when 
previous academic achievement and test scores are 
taken into account (Elliott and Strenta 1988; Goldman 
and Hewitt 1975; Goldman et al. 1974; Goldman and 
Widawski 1976; Strenta and Elliott 1987; Willingham 
1985). 

A number of studies have found that gender differ- 
ences in the prediction of college grades (usually in the 
direction of underprediction of females' grades) are re- 
duced, eliminated, or occasionally reversed when 
grading leniency is controlled, either by predicting in- 
dividual course grades (McCormack and McLeod 1988; 
Ramist et al. 1994), or by adjusting the cumulative 
grade-point average (Elliott and Strenta 1988; Stricker 
et al. 1991; Young 1991). Nevertheless, some gender 
differences in the prediction of college grades remained 
statistically significant even after controlling for grading 
standards (Stricker et al. 1991), but the differences 
tended to be smaller or nonexistent at more selective 
colleges with students who had high average composites 
of high school grades and SAT scores (Ramist et al. 
1994). Some authors have argued that test bias (among 
other explanations) cannot be ruled out (e.g., Elliott and 
Strenta 1988) because gender differences have persisted 
in the prediction of individual course grades in psy- 
chology (Elliott and Strenta 1988), mathematics 
(Bridgeman and Wendler 1989, 1991) and a variety of 
other subject areas (Ramist et al. 1994). 

Either method, the prediction of individual course 
grades or the adjustment of the cumulative grade-point 
average, used in the aforementioned studies is labor 
intensive and impractical for routine application in 
many settings. Both methods depend on the analysis of 
individual course grades at the undergraduate level, 
which is fraught with practical difficulties. Not every 
student takes every course, so for the majority of 
courses, samples of students enrolled in a particular 
course are unrepresentative and small. These factors 
introduce statistical complexities in the analysis of tran- 
script data that are not easily handled by routine proce- 
dures. Furthermore, transcript data are not always 
available in a form readily usable for computer 
analyses. 

As a more practical option, some researchers have 
categorized grade-point average by schools within a 
university (Gamache and Novick 1985) or used college 
major data to control for leniency in course grading in 
the study of gender-differentiated prediction (Pennock- 
Roman 1990). Unlike transcript data, information on 
within-university subdivisions or college majors is more 
accessible. Frequently, university records contain stu- 
dents' intended college majors, or alternatively, the ma- 
jority of students taking the SAT indicate their intended 
field of study on the Student Descriptive Questionnaire 
(SDQ). While the curriculum is less specialized in the 
freshman year than in the later years of college, differ- 
entiation may occur even in the first year. For example, 
the introductory physics course taken by physics 
majors may be faster paced and more mathematical 
than the introductory physics course taken by non- 
science majors. 

Although little is known about the effectiveness of 
controlling for grading standards by analyzing grades in 
terms of subdivisions within institutions, the approach 
using college majors has shown promising results. 



Gamache and Novick (1985) did not analyze gender 
effects on overall grades pooling across college subdivi- 
sions; therefore, we cannot tell whether the gender dif- 
ferences they found within each subdivision would have 
been the same or smaller than in analyses using overall 
grades pooling all subdivisions. However, using dummy 
variables to categorize college majors improved the 
prediction of college grades in studies by Goldman 
and Hewitt (1976) and Pennock-Roman (1990). In 
these studies, the effects of ethnicity or gender in pre- 
diction were reduced but not completely eliminated. 
Pennock-Roman (1990) demonstrated no statistically 
significant gender effects on freshman college grades 
at five of six large and prominent universities after 
controlling for college major in Latino American and 
non-Latino white groups. It is possible that gender dif- 
ferences might have been completely eliminated if a 
finer grouping of majors had been achieved and if the 
classification of majors had been specifically tailored to 
each institution. 1 

Rationale 

In the present investigation, analyses of Pennock- 
Roman's (1990) data were extended in several ways. 
First, differences in regressions were examined when 
each of the predictors (high school grades, SAT-V, and 
SAT-M) were considered jointly versus singly. These 
analyses evaluated how much each predictor con- 
tributed to gender differences in the prediction of 
grades. For example, the question of possible grade in- 
flation in high school grades for females was addressed 
by evaluating whether freshman grades were lower than 
expected given the high school record. Second, slope 
coefficient differences were examined, whereas the pre- 
vious study considered only possible intercept differ- 
ences. Third, the analyses included Asian American and 
African American samples for whom data were col- 
lected and merged but not analyzed in the previous 
study, which had focused on Latino American and non- 
Latino white students. Findings on gender differences 
within Asian American and African American groups 
are less often available than for the non-Latino white 
group. In particular, it was expected that the Asian 
American group would show a more gender- 
balanced choice of majors in the physical sciences 
and that gender differences in prediction before ad- 
justing for college majors might be smaller than in other 
groups. 

1 In that study, no further evaluation of the influence of college major on gender 
differences in prediction was done because the main focus was on the effects of 
language background on the prediction of college grades. Gender was only one 
of several control variables. 

Finally, the categorization of major fields by insti- 
tution was improved by grouping majors according to 
empirically derived measures of grading leniency rather 
than by similarity of subject matter. In the prior 
analyses, only a rather crude, four-category classifica- 
tion of college majors was used. The categories were: 
(1) physical sciences and engineering, (2) biological and 
health sciences, (3) humanities, prelaw, and social sci- 
ences, and (4) business, education, communication, and 
home economics. There was considerable heterogeneity 
within these categories (e.g., premedicine, biology, and 
nursing were grouped together). It is possible that this 
classification was not a good control for grading le- 
niency at the public university in Texas, the only insti- 
tution at which a significant gender effect was found. 
The categorization derived here was expected to control 
more effectively for grading leniency, thus reducing the 
gender effect, if this effect was truly an artifact. Gender 
effects on the prediction of freshman grades were ex- 
amined before and after taking into account the reclas- 
sification of college majors. 



Method 

Data Source 

Four institutions from the Pennock-Roman (1990) data 
set were included: a public university in Texas, a private 
university in Massachusetts (two freshman classes), and 
two universities in California, one public and the other 
private. The original set of institutions included two ad- 
ditional universities that will not be considered here be- 
cause the relationship between preadmission measures 
and college grades was atypically low at those institu- 
tions. The sample sizes here are smaller than in the pre- 
vious study for two reasons. First, students lacking any 
of the predictors — high school grade-point average 
(HSGPA), SAT-V, or SAT-M — were excluded from the 
analyses. Second, students reporting that English was 
not their best language were also excluded from the 
analyses. Ramist et al. (1994) found that the college 
grades of non-native speakers of English tended to be 
underpredicted by test scores. It was desirable to focus 
here on gender differences among students for whom 
English is their best language, thus avoiding the addi- 
tional variability introduced by language background. 
Information about college majors was directly avail- 
able from institutional records only for the universi- 



ties in Texas and Massachusetts; therefore, for the two 
California universities, responses to the Student De- 
scriptive Questionnaire were used to classify students by 
major. 

Procedure for Categorization of 
College Majors 

In order to identify empirically which college majors 
had average, substantially easier, or substantially 
harsher grading standards at each of the four institu- 
tions, the first step involved multiple regression analyses 
for predicting freshman college grade-point average 
(FGPA) from SAT scores and high school grades. 
Analyses were run separately at each university for 
males and females who were non-Latino white. The 
non-Latino white groups were chosen to classify fields 
of study because they were the largest groups at each 
university and they had the greatest variety of majors. 
The analyses separated groups by gender and race/eth- 
nicity in order to distinguish the effects of college major 
on FGPA from demographic-group effects. In a regres- 
sion combining both sexes, it would be difficult to 
interpret residuals for majors where there was a dispro- 
portionate representation of males or females. For ex- 
ample, if physics majors have lower grades, it could be 
argued that it is not grading standards per se that are 
tougher for physics majors. Instead, the effect could be 
due to the disproportionate presence of males with 
lower FGPAs in comparison with other majors that 
have more females with higher FGPAs. 

The second step was to calculate the residual dif- 
ferences between students' predicted FGPA and their 
actual FGPA, which were then divided by their stan- 
dard errors (separate analyses by sex). Then, mean 
values of the standardized residuals were calculated for 
groups of students with the same college major, ig- 
noring gender. The assumption here was that the av- 
erage residual for each major at a given institution is a 
function of the leniency of grading standards for that 
major at that institution. This assumption is tenable 
only if there are a sufficient number of individuals 
within a category so that other personal idiosyncrasies 
in characteristics that influence freshman grades (e.g., 
study habits) or statistical "errors" will cancel each 
other out. 
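
The first two steps can be sketched in code. What follows is a minimal illustration rather than the program used in the study. It assumes NumPy and ordinary least squares, takes the residual as actual minus predicted FGPA (so that positive values point to lenient grading), and standardizes by each within-gender regression's root mean square error. The function names and the simulated data are hypothetical.

```python
import numpy as np

def standardized_residuals(predictors, fgpa):
    """Fit FGPA on the predictors by OLS; return (actual - predicted) / RMSE."""
    X = np.column_stack([np.ones(len(fgpa)), predictors])  # intercept column
    beta, *_ = np.linalg.lstsq(X, fgpa, rcond=None)
    resid = fgpa - X @ beta
    dof = len(fgpa) - X.shape[1]          # cases minus fitted weights
    rmse = np.sqrt(resid @ resid / dof)   # assumed choice of standard error
    return resid / rmse

def mean_residual_by_major(predictors, fgpa, gender, major):
    """Regress separately by gender, then average residuals by major, ignoring gender."""
    predictors = np.asarray(predictors, dtype=float)
    fgpa = np.asarray(fgpa, dtype=float)
    gender = np.asarray(gender)
    major = np.asarray(major)
    z = np.empty(len(fgpa))
    for g in np.unique(gender):
        rows = gender == g
        z[rows] = standardized_residuals(predictors[rows], fgpa[rows])
    return {str(m): float(z[major == m].mean()) for m in np.unique(major)}

# Simulated data (hypothetical): one leniently graded and one harshly graded major.
rng = np.random.default_rng(0)
n = 400
hsgpa = rng.normal(3.3, 0.4, n)
satv = rng.normal(500, 90, n)
satm = rng.normal(530, 90, n)
gender = rng.choice(["F", "M"], n)
major = rng.choice(["lenient", "harsh"], n)
bonus = np.where(major == "lenient", 0.3, -0.3)
fgpa = 0.5 + 0.5 * hsgpa + 0.001 * satv + 0.001 * satm + bonus + rng.normal(0, 0.3, n)

means = mean_residual_by_major(np.column_stack([hsgpa, satv, satm]), fgpa, gender, major)
```

In this simulation the leniently graded major ends up with a positive mean standardized residual and the harshly graded major with a negative one, which is the pattern the procedure relies on.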

However, there were so many individual categories 
of majors in the SDQ and the institutional data that 
some categories included only one student. In order to 
derive a more stable estimate of grading leniency for 
fields of study, it was necessary to group related cate- 



gories of majors with similar residual values. Fields of 
study were then organized into larger categories similar 
to the groupings used by the National Research Council 
(1987, p. 82) in the Annual Survey of Earned Doctor- 
ates. The mean standardized residuals were calculated 
for these broader categories and compared with the 
means obtained from the finer categorization of majors. 
The classification of fields of study was refined as an it- 
erative process until each final grouping had at least 6 
cases (but typically more than 20) and the residual 
values for students and subcategories within the group- 
ing were consistent with each other. For example, the 
broad health sciences grouping was eventually subdi- 
vided into three clusters for the two California institu- 
tions. There were two large clusters, premedicine and 
unspecified health sciences, which were kept separate 
because they had quite different residuals. All other 
health categories had very few cases. Categories such as 
preveterinary and predentistry had residuals consistent 
with premedicine and were assigned to the same cate- 
gory as premedicine. Others with higher residuals than 
either of the two main clusters were placed into a third 
health cluster. 

The third step was to create a variable (MAJSCAL) 
that reflected the degree of grading toughness of the stu- 
dent's category of college major at his or her institution 
as measured by the size of the mean residual for that 
major. If the mean residual fell in the interval -0.0499 to 
+0.0499, it was assigned a value of zero. A mean residual 
between 0.0500 and 0.1499 was assigned a +1. Fields 
with mean residuals between -0.1500 and -0.2499 
were assigned a -2, and so forth. 
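
The banding rule amounts to rounding the mean residual to the nearest tenth and expressing it in tenths. A small sketch of that rule follows. The function name is hypothetical, and the treatment of values beyond the three bands quoted above is an assumption that simply continues the same 0.1-wide pattern.

```python
def majscal_from_mean_residual(mean_residual):
    """Map a major's mean standardized residual to an integer scale value.

    Bands are 0.1 wide and centered on multiples of 0.1, for example
    -0.0499..+0.0499 -> 0, 0.0500..0.1499 -> +1, -0.1500..-0.2499 -> -2.
    """
    # Work in units of 0.0001 (residuals are quoted to four decimals)
    # so that band edges are compared as exact integers.
    ten_thousandths = round(abs(mean_residual) * 10000)
    magnitude = (ten_thousandths + 500) // 1000
    return magnitude if mean_residual >= 0 else -magnitude
```

For instance, a mean residual of 0.12 falls in the +1 band and a mean residual of -0.20 falls in the -2 band.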

The number of categories of majors and the range 
of MAJSCAL varied by institution. When the sample 
sizes were large, it was possible to include more cate- 
gories of majors. For the university in Texas, there were 
49 categories of majors, having frequencies from 11 
(biochemistry) to 1,160 (prebusiness) in the non-Latino 
white group. MAJSCAL at this institution ranged from 
-5 (undetermined, pharmacy, and computer science ma- 
jors) to +10 (accounting, advertising, marketing, and fi- 
nance majors). For the university in Massachusetts, 
there were 32 categories of majors with frequencies 
ranging from 18 (sociology and criminology) to 916 
(unspecified liberal arts) in the non-Latino white group. 
MAJSCAL at this institution ranged from -7 (engi- 
neering majors) to +5 (acting and voice performance 
majors). For the public university in California, there 
were 20 categories of majors, ranging in frequency from 
13 (history, philosophy, and religion) to 207 (engi- 
neering) in the non-Latino white group. MAJSCAL at 
this institution ranged from -5 (engineering) to +5 (Eng- 



lish and education). For the private institution in Cali- 
fornia, there were 20 categories of majors, ranging in 
frequency from 6 (nursing and similar health sciences) 
to 172 (engineering). MAJSCAL at this university 
ranged from -3 (physical sciences other than engi- 
neering) to +6 (foreign languages, history, culture, and 
religion). 

Transformation of Units for 
Independent and Dependent 
Variables 

In order to preserve significant digits for the raw re- 
gression weights in the computer printout, FGPA and 
HSGPA were multiplied by 10 and SAT scores were di- 
vided by 10. Of course, the mean and standard devia- 
tion of the transformed FGPA and HSGPA were 10 
times larger than the usual values, whereas the mean 
and standard deviation of the transformed SAT scores 
were 10 times smaller than the original scores. Correla- 
tions and R-square values were unaffected by these 
transformations, but the root mean square error was in 
the same units as the transformed FGPA, that is, 10 
times larger than usual. As intended, the raw regression 
weights of the transformed SAT scores were 100 times 
larger when compared with analyses in other studies 
using untransformed scores. Thus, no significant digits 
in the regression weights were lost by rounding in the 
computer printouts, which increased accuracy in the 
calculation of predicted grades using the male groups' 
equations. 
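
The effect of the rescaling on the regression output can be checked numerically. The sketch below is illustrative only: it assumes NumPy and ordinary least squares, and the simulated scores and the helper name `ols` are hypothetical.

```python
import numpy as np

def ols(predictors, y):
    """Return (weights including intercept, R-square, root mean square error)."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    rmse = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
    return beta, r2, rmse

# Simulated scores on the usual scales (hypothetical values).
rng = np.random.default_rng(1)
n = 300
hsgpa = rng.normal(3.3, 0.4, n)
satv = rng.normal(500, 90, n)
satm = rng.normal(530, 90, n)
fgpa = 0.4 + 0.5 * hsgpa + 0.001 * satv + 0.0015 * satm + rng.normal(0, 0.4, n)

# Original units versus the transformed units described above.
raw = ols(np.column_stack([hsgpa, satv, satm]), fgpa)
scaled = ols(np.column_stack([hsgpa * 10, satv / 10, satm / 10]), fgpa * 10)
```

Comparing `raw` with `scaled` shows SAT weights 100 times larger, an unchanged HSGPA weight (both FGPA and HSGPA were multiplied by 10), an unchanged R-square, and a root mean square error 10 times larger.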

Regression Analyses 

For each gender-by-racial/ethnic group that had at least 
40 cases within a university, freshman grades were pre- 
dicted from high school grades and SAT scores (the 
standard model). Furthermore, another model was run, 
adding MAJSCAL (the new variable or major scale) to 
the standard model. The results of the two models, 
with and without MAJSCAL, were compared for each 
group. 

To test for gender differences within groups, regres- 
sion models were run in each racial/ethnic group, 
pooling males and females. Dummy variables identi- 
fying gender were used to test the lines for parallelism 
(i.e., no interactions or slope coefficient differences) and 
coincidence (i.e., equal slopes and equal intercepts). All 
24 regression models are shown in Table 1. As can be 
seen, there were eight sets of models, each with three 
variations. Within a set, the nondummy variables were 
all the same but the first version had no dummy vari- 
ables, the second had just the dummy intercept term, 
and the third had both intercept and slope difference 
terms. 

TABLE 1 

Predictors in Regression Models Used to Test for Gender Differences 

Model/Control for   Versions of Model 
Grading Leniency    (1) No Gender Terms         (2) Intercept Differences         (3) Intercept and Slope Differences 

HSGPA only 
  No control        HSGPA                       HSGPA + Gender                    HSGPA + Gender + G x HSGPA 
  With control      HSGPA + MAJSCAL             HSGPA + MAJSCAL + Gender          HSGPA + MAJSCAL + Gender + G x HSGPA 

SAT-V only 
  No control        SAT-V                       SAT-V + Gender                    SAT-V + Gender + G x SAT-V 
  With control      SAT-V + MAJSCAL             SAT-V + MAJSCAL + Gender          SAT-V + MAJSCAL + Gender + G x SAT-V 

SAT-M only 
  No control        SAT-M                       SAT-M + Gender                    SAT-M + Gender + G x SAT-M 
  With control      SAT-M + MAJSCAL             SAT-M + MAJSCAL + Gender          SAT-M + MAJSCAL + Gender + G x SAT-M 

Standard 
  No control        HSGPA + SAT-V + SAT-M       HSGPA + SAT-V + SAT-M + Gender    HSGPA + SAT-V + SAT-M + Gender + 
                                                                                  G x HSGPA + G x SAT-V + G x SAT-M 
  With control      HSGPA + SAT-V + SAT-M       HSGPA + SAT-V + SAT-M             HSGPA + SAT-V + SAT-M + MAJSCAL + 
                    + MAJSCAL                   + MAJSCAL + Gender                Gender + G x HSGPA + G x SAT-V + G x SAT-M 

Note: The interaction terms abbreviate gender as "G" (e.g., G x HSGPA is the interaction between gender and HSGPA). To test whether the regression lines of 
males and females were coincident, the R-squares for version 3 were compared with those of version 1 in the sample that pooled males and females. To test for 
parallelism, versions 3 and 2 were compared in the pooled sample. 

Three models had only one preadmission measure 
entered at one time (HSGPA, SAT-V, or SAT-M). The 
fourth was the standard model that included all three 
preadmission measures. A second group of four models 
included all of the same predictors as the first group of 
four sets, but, in addition, these models included the 
variable MAJSCAL, which controlled for grading le- 
niency by major. Regression differences were evaluated 
primarily on the basis of effect sizes; that is, group-dif- 
ference terms with uniqueness contributions of .01 or 
larger to the accountable variance (Cohen 1988) were 
considered nontrivial. 
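
The comparison of the three versions of a model can be illustrated as follows. This is a hedged sketch, not the original analysis code: it assumes NumPy, ordinary least squares, and a 0/1 gender dummy, and it treats the contribution of the gender terms as the gain in R-square between nested versions. The function names and the simulated data are hypothetical.

```python
import numpy as np

def r_square(columns, y):
    """R-square of an OLS fit of y on the given columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def gender_r_square_gains(predictors, y, female, controls=()):
    """Compare versions 1, 2, and 3 of a model in the pooled sample.

    predictors: preadmission measures, each also interacted with gender in version 3.
    controls:   variables such as MAJSCAL that enter without a gender interaction.
    """
    g = np.asarray(female, dtype=float)
    base = [np.asarray(p, dtype=float) for p in predictors]
    ctrl = [np.asarray(c, dtype=float) for c in controls]
    r2_v1 = r_square(base + ctrl, y)                                 # no gender terms
    r2_v2 = r_square(base + ctrl + [g], y)                           # intercept difference
    r2_v3 = r_square(base + ctrl + [g] + [g * p for p in base], y)   # plus slope differences
    return {
        "intercept": r2_v2 - r2_v1,    # version 2 versus version 1
        "parallelism": r2_v3 - r2_v2,  # version 3 versus version 2
        "coincidence": r2_v3 - r2_v1,  # version 3 versus version 1
    }

# Simulated pooled sample (hypothetical): females earn higher grades at equal predictors.
rng = np.random.default_rng(2)
n = 600
female = rng.integers(0, 2, n)
hsgpa = rng.normal(3.3, 0.4, n)
satv = rng.normal(500, 90, n)
satm = rng.normal(530, 90, n)
majscal = rng.integers(-3, 4, n).astype(float)
fgpa = (0.4 + 0.5 * hsgpa + 0.001 * satv + 0.001 * satm
        + 0.05 * majscal + 0.3 * female + rng.normal(0, 0.4, n))

gains = gender_r_square_gains([hsgpa, satv, satm], fgpa, female, controls=[majscal])
```

Under the effect-size criterion described above, a gain of .01 or more would be treated as nontrivial. In this simulation only the intercept term is built to matter, so the parallelism gain stays small.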

Finally, the eight regression models (without 
dummy terms) were analyzed with just the males of 
each racial/ethnic group, and the estimated parameter 
values for the male groups were used to predict FGPA 
for the corresponding female group. Average differences 
between predicted and actual values were calculated for 



the standard, HSGPA-only, SAT-V-only, and SAT-M- 
only models, with and without MAJSCAL. 
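
Applying the males' equation to the females can be sketched as below. The example assumes NumPy and ordinary least squares, and it reports the gap as actual minus predicted FGPA, so a positive value corresponds to underprediction of females' grades. The function name and the tiny noise-free data set are hypothetical.

```python
import numpy as np

def female_prediction_gap(predictors_m, fgpa_m, predictors_f, fgpa_f):
    """Fit the model on males only; return mean(actual - predicted) for females."""
    fgpa_m = np.asarray(fgpa_m, dtype=float)
    fgpa_f = np.asarray(fgpa_f, dtype=float)
    X_m = np.column_stack([np.ones(len(fgpa_m)), predictors_m])
    beta, *_ = np.linalg.lstsq(X_m, fgpa_m, rcond=None)   # males' equation
    X_f = np.column_stack([np.ones(len(fgpa_f)), predictors_f])
    return float(np.mean(fgpa_f - X_f @ beta))

# Noise-free toy data: females earn 0.2 grade points more at any given HSGPA.
hsgpa_m = np.array([2.0, 2.5, 3.0, 3.5, 4.0])
fgpa_m = 1.0 + 0.5 * hsgpa_m
hsgpa_f = np.array([2.2, 3.1, 3.8])
fgpa_f = 1.2 + 0.5 * hsgpa_f

gap_females = female_prediction_gap(hsgpa_m, fgpa_m, hsgpa_f, fgpa_f)
gap_males = female_prediction_gap(hsgpa_m, fgpa_m, hsgpa_m, fgpa_m)
```

The same function returns a gap of zero when the males' equation is applied to the males themselves, since a least-squares fit with an intercept reproduces the group mean.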



Results 

Means, Standard Deviations, and 
Distribution of Majors 

Group sizes, means, and standard deviations are shown 
for the four institutions by race/ethnicity and gender in 
Tables 2a, 2b, 2c, and 2d. All groups are shown here, 
but the small ones, those with fewer than 40 males and 
40 females, were not analyzed in the later regressions. 
Consistent with previous findings, females tended to 
score lower than males on the SAT-M, although their 
freshman grades tended to be slightly higher. The Asian 
American group was the only one for whom the mean 
FGPA was slightly higher for males in three of the four 
universities. The variable that reflects grading leniency 
by major (MAJSCAL) was slightly higher (more posi- 
tive) for females in the majority of groups. 
TABLE 2a 
Means and Standard Deviations by Race/Ethnicity and 
Gender: Texas Institution 

[Table body not recoverable from the source: means and standard deviations of FGPA, HSGPA, SAT-V, SAT-M, and MAJSCAL for males and females within each racial/ethnic group.] 

Note: Freshman grades (FGPA) and high school grades (HSGPA) were on a scale of 0 to 40. SAT scores were on a scale of 20 to 80. The Other group was not included in regression analyses. 



These mean differences are consistent with patterns 
of choice of major field. When quantitative majors were 
grouped into one category (including mathematics, 
engineering, computer science, physical sciences, earth 
sciences, and ocean sciences), the percentage of males 
majoring in this category was 14 to 31 percentage 
points higher than the percentage of females of the same 
race/ethnicity majoring in that category. There were 
two exceptions, however. At the private California in- 
stitution, the percentages of males versus females ma- 
joring in the sciences were nearly equal in the Latino 
American group (42 percent versus 40 percent, respec- 
tively) and in the Asian American group (29 percent 
versus 25 percent, respectively). In contrast, the largest 
gender differences in this category of majors were found 
among Asian Americans (47 percent versus 16 percent) 
and Latino Americans (17 percent versus 9 percent) at 
the university in Texas, and the non-Latino white stu- 
dents at the public California institution (41 percent 
versus 16 percent). Hence, contrary to the author's ex- 
pectation, the pattern of major choices in the Asian 
American group did not show greater gender balance 
among quantitative majors compared with other ethnic 
groups. 

TABLE 2b 
Means and Standard Deviations by Race/Ethnicity and 
Gender: Massachusetts Institution 

TABLE 2c 
Means and Standard Deviations by Race/Ethnicity and 
Gender: California Public Institution 

Regression Analyses with and 
without MAJSCAL 

Results for two sets of regressions of FGPA on HSGPA 
and SAT scores using the standard model are shown in 
Tables 3a, 3b, and 3c for gender by racial/ethnic groups 
having at least 40 cases. One set of analyses included 
only HSGPA and SAT scores as predictors, and the 
second set of regressions included these same three pre- 
dictors plus the additional variable MAJSCAL that con- 
trols for grading leniency by college major. The root 
mean square errors and R-squares are shown for both 
models; further, the contribution to R-square by the ad- 
dition of the MAJSCAL variable is reported. Note that 
Table 3c contains results for both institutions in Cali- 
fornia. 

TABLE 3a 
Standard Model Prediction of Freshman Grades by Gender and Race/Ethnicity: Texas Institution 

TABLE 3b 
Standard Model Prediction of Freshman Grades by Gender and Race/Ethnicity: Massachusetts Institution 

TABLE 3c 
Standard Model Prediction of Freshman Grades by Gender and Race/Ethnicity: California Institutions 

As these tables show, for the majority of groups 
in all institutions, there was a substantially greater 
R-square for the model that contains MAJSCAL. Such 
an improvement would have to occur by necessity in the 
non-Latino white groups because residual values from 
the regressions in these groups were the basis for de- 



riving the MAJSCAL variable. However, the MAJSCAL 
derivation did not depend at all on residual values for 
the Asian American, African American, and Latino 
American groups; therefore, its application in these 
groups can be considered a cross validation of its use- 
fulness. The results show that the increases in R-square 
were fairly large in the majority of groups other than 
the non-Latino white groups. Thus, the method ap- 
pears to have cross validated about as well as can be ex- 
pected. 

Male and Female Differences in 
Regressions 

TABLE 4 
Gender Differences after Controlling for Majors: Contributions to R-Square 

Differences in the regressions for the two gender groups 
are summarized for non-Latino white, Asian American, 
African American, and Latino American students in Ta- 
bles 4 and 5a, 5b, and 5c. Table 4 reports statistical 
tests to evaluate intercept and slope coefficient differ- 
ences after the control for major was introduced. Tables 
5a to 5c compare actual and predicted values for mean 
FGPA for females, by race/ethnicity. Note that there are 
two racial/ethnic groups in Table 5lf. The original scale 
for FCJPA with a maximum value of 4.00 was used in 
these tables. The males' equations were used to derive 
predicted FCJPA for females. The first four columns 
show the results for models not including a control for 
leniency in grading standards by major. The model re- 
ferred to as the "Standard 3" used HSGPA and SAT 
scores as the three predictors, whereas each of the other 
three regression models shown used only one of these 
variables as a predictor. The second set of four columns 
shows the results for parallel analyses, this time based 
on regressions that included the same corresponding 
variables plus MAJSCAI.. (Of course, values for males 
are not shown since the use of the males' equations 



guarantees perfect agreement between the actual and 
predicted values at the mean.) 
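A small sketch may make the comparison concrete. It assumes a
single-predictor model (as in the HSGPA-only case) fitted by ordinary
least squares; the data and the helper names `fit_line` and
`mean_actual_minus_predicted` are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def mean_actual_minus_predicted(x, y, a, b):
    """Mean of actual y minus the value predicted by a + b*x."""
    return mean(yi - (a + b * xi) for xi, yi in zip(x, y))

# Hypothetical data: high school GPA and freshman GPA
male_hsgpa, male_fgpa = [2.0, 3.0, 4.0], [2.0, 2.5, 3.0]
female_hsgpa, female_fgpa = [3.0, 4.0], [2.7, 3.2]

# Fit on males only, then apply the males' equation to both groups
a_m, b_m = fit_line(male_hsgpa, male_fgpa)
female_gap = mean_actual_minus_predicted(female_hsgpa, female_fgpa, a_m, b_m)
male_gap = mean_actual_minus_predicted(male_hsgpa, male_fgpa, a_m, b_m)
```

A positive `female_gap` corresponds to underprediction of females' grades
by the males' equation. `male_gap` is zero up to rounding because least
squares residuals average to zero in the group used for fitting, which is
why values for males carry no information at the mean.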

Standard Model 

The results in Tables 4 and 5a, 5b, and 5c showing
differences between males and females in the standard
model were very consistent across groups and
universities in terms of general pattern. Although gender
effects tended to be somewhat smaller when MAJSCAL was
added, the pattern of differences for a given group was
unaltered by the inclusion of MAJSCAL. To avoid
redundancy, only the statistical tests with the control for
grading leniency are shown in Table 4. For example,
before the inclusion of MAJSCAL, there were only two
intercept-difference terms that contributed more than .01
to R-square: the non-Latino white group at the
university and the African American group at the
private California institution. These two terms were re- 
duced, respectively, from .0124 to .00K2 and from 
.O.i 15 to .0272; nevertheless, they remained statistically 
significant and the amount of underprediction of actual
grades was still modestly large for each (0.146 and
0.128, respectively). The other groups showed no large
or statistically significant intercept differences. Some
slope coefficient dissimilarities were found, but there
were no consistent patterns across groups and
universities in terms of the variable involved or the
direction of the gender difference.
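One common way to formalize intercept and slope tests of this kind,
offered here as an assumption about the setup rather than a statement of
the exact specification used in the study, is a sequence of pooled
regressions with a gender indicator G (hypothetical coding: 0 = male,
1 = female) and predictors X_k (HSGPA, SAT-V, SAT-M, and MAJSCAL):

```latex
\begin{align*}
\text{Reduced:}\quad
  & \mathrm{FGPA} = \beta_0 + \sum_k \beta_k X_k + \varepsilon \\
\text{Intercept difference:}\quad
  & \mathrm{FGPA} = \beta_0 + \sum_k \beta_k X_k + \gamma_0 G + \varepsilon \\
\text{Slope differences:}\quad
  & \mathrm{FGPA} = \beta_0 + \sum_k \beta_k X_k + \gamma_0 G
    + \sum_k \gamma_k \,(G \cdot X_k) + \varepsilon \\[4pt]
\Delta R^2_{\text{intercept}}
  & = R^2_{\text{intercept difference}} - R^2_{\text{reduced}}
\end{align*}
```

Under this reading, a "contribution to R-square" of .01 means that adding
the gender term raises the proportion of explained variance by one
percentage point.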

Overall, the actual freshman grades of females
were higher than the values predicted using the male
students' regression equations, both before and after
including the control variable for grading leniency
(MAJSCAL) in the regressions. There were only
three exceptions out of 14 contrasts. Predicted values
were higher than actual FGPA for females in the
Latino American group at the Massachusetts university
and in the Asian American groups at both the
Texas and Massachusetts universities. These differ- 
ences were less than 0.0H grade-point units in absolute 
value. 

HSGPA-Only Model 

Findings for this model were similar to the standard
model results for African American and Latino American
students in that FGPA tended to be higher than
predicted when the males' equations were used, both
before and after including MAJSCAL. The effects were
trivially small and nonsignificant except for the African
American group at the California private institution,
which also had a large slope coefficient difference
(HSGPA more correlated with FGPA for females). With
few exceptions, the divergence between actual and
predicted grades was smaller with the HSGPA-only model
than with the standard model.

The pattern of differences was reversed for Asian
American students with this model: the males' equations
tended to overpredict the grades of females, and
the differences tended to become larger in a negative
direction when MAJSCAL was added. Differences ranged
from -0.032 to -0.164 grade-point units when
MAJSCAL was included in the regression model. These
figures tended to be slightly larger in absolute value
than for the standard model. Among non-Latino white
students, there was underprediction of females' grades
before including MAJSCAL, but the direction of the
difference reversed when MAJSCAL was added to the
equations. The degree of overprediction ranged from
-0.017 to -0.040 grade-point units when MAJSCAL
was included. These differences tended to be smaller in
absolute value than the standard model differences for
the non-Latino white group. Regardless of the direction
of the effect, gender dissimilarities in intercepts for
Asian Americans and non-Latino whites were not
statistically significant except at the Massachusetts
university and these effects were small. There was only one
intercept difference (at the Massachusetts institution) that
contributed more than .01 to R-square for Asian
Americans and the only significant effect for the
non-Latino white group had a trivially small contribution to
R-square (.0012). At two institutions, the Massachusetts
and private California universities, HSGPA
was more correlated with FGPA for Asian American
males and there were fairly large slope coefficient
differences.

SAT-V-Only Model 

Overall, the results with the SAT-V-only model for the
first three groups were quite similar to the findings with
the standard model in that females' grades were
underpredicted by the males' equations. The only large
intercept difference occurred for the African American group
at the private California institution. One slope
coefficient difference, for the Latino American group at
the Massachusetts university, had a contribution to
R-square larger than .01 (higher correlation between
SAT-V and FGPA for females). The amount of
underprediction of females' grades was about the same or
slightly larger than with the standard model (ranging 
from -0.016 to 0.1 ?5) when grading leniency was con- 
trolled. 

The grades of Asian American females tended to be
overpredicted by the males' equations when MAJSCAL
was included. These differences ranged from -0.125 to
0.031 without MAJSCAL and from -0.136 to 0.021
after including MAJSCAL. None of the intercept
differences contributed more than .01 to R-square; there was
one fairly large slope coefficient difference at the Texas
university, in that SAT-V was more highly correlated
with FGPA for females.

[Table: Mean Predicted FGPA for Asian American Female Students Using
Male Students' Regressions. Rows, for each university: actual mean
FGPA, predicted FGPA, and actual minus predicted. Columns, by
predictors in the regression models: Standard 3 (combined), HSGPA,
SAT-V, and SAT-M, first without MAJSCAL and then with each plus
MAJSCAL. Cell values are not legible in the source.]

SAT-M-Only Model 

This model showed consistent and fairly large
underprediction, on average, of females' grades for all groups,
including Asian Americans, at all universities. The
equations not including MAJSCAL underpredicted females'
grades by 0.051 to 0.305 grade-point units, and those
including MAJSCAL underpredicted females' grades by
0.045 to 0.256 grade-point units. Intercept differences
in the models not including MAJSCAL had contributions
to R-square larger than .01 for 9 of 14 contrasts.
These differences were smaller in the models including
MAJSCAL, but three contrasts had contributions to
R-square larger than .01: the non-Latino white group
at the Texas university, the African American group at
the private California university, and the Latino
American group at the Texas university. Some slope
coefficient differences were found, but they were not
consistent in direction.

Comparison of R-Squares across Models 

Naturally, the model that had the most accurate
prediction (i.e., the highest R-squares and lowest root mean
square errors) in every group at all institutions was the 
standard model. For analyses where males and females 
were pooled together, R-squares for this model ranged 
from .0S(i4 to .4~0i for different racial/ethnic groups 
and universities when MAJSCAL was included in the 



analvsis. In contrast, R-squares for the HSCPA-only, 
SAT-V-onlv, and SAT-M-only models ranged from 
,0558 10 ,340f, .O.n to .2423, and .0318 to .3066, re- 
spectively. The addition of SAT-V and SAT-M jointly to 
HSGPA in the standard model produced an increment
in R-square that ranged from .02 m . P for different 
groups and universities. 

Discussion 

This study addressed three basic issues: How effectively 
can the use of information on college majors control for 
differential grading practices across fields of study? 
What are the relative contributions of each predictor 
variable (HSGPA, SAT-V, SAT-M) separately and in
combination to gender differences in the prediction of 
college grades? Can differential grading practices 
across fields of study account for variations in gender 
differences across racial/ethnic groups and across uni- 
versities? 

Effectiveness of College Majors as 
a Control for Grading Leniency 

Although many prior studies have established that
leniency in grading varies by subject area, the question
remains of how to control for these effects in a practical
yet effective way. Because variations in the leniency of
grading practices do not reflect real differences in
achievement, these scale differences are an important
nuisance factor that needs to be controlled. The most
precise methods to control it have involved use of
individual course grades from transcripts, an approach
that is not feasible in most studies.

The present study investigated to what extent one 
could control for differential grading across courses by 
using information on college majors instead of indi- 
vidual course grades. Information on college majors is 
generally more readily obtainable than full transcript 
data. Furthermore, analyses involving college majors 
are relatively straightforward and avoid the complexi- 
ties of having unequal, and often very small, groups of 
students in individual courses. The categorization of 
majors was different from that used in prior studies in 
that it was carried out in a fashion tailored specifically 
to each institution. 

It was found that introducing the variable 
MAJSCAL, which controlled for differential grades 
across majors, did increase predictive accuracy for 
nearly all groups and did reduce intercept differences 
and the amount of underprediction of females' grades. 
The present study confirmed prior findings with regard 
to differential grading practices. Hence, there is ample 
evidence from this study and others that college grades 
differ in scale across fields of study and that variations 
in grading leniency contribute to variations in grade- 
point average. Since males and females are unequally 
distributed among fields that differ in grading leniency, 
such variations need to be controlled if we are to ex- 
amine gender differences in the prediction of college 
grades. 

Despite substantial improvement in the accuracy of 
prediction, this method worked no better than the ear- 
lier categorization of majors in terms of reducing gender 
differences. The intercept-difference term here for the 
Texas university using the standard model was about 
the same size and still as statistically significant as in the 
earlier analysis (Pennock-Roman 1990). 

The near equality of results for the dummy-variable 
approach (Pennock-Roman 1990) and the use of 
MAJSCAL in this study to control for variations in 
grading leniency by major suggests that the most im- 
portant distinctions can be preserved by using three 
broad categories: quantitative sciences, biological sci- 
ences, and nonquantitative nonscience fields. For ex- 
ample, in analyses of combined Latino American and 
non-Latino white groups, Pennock-Roman (1990, 
Table 3.15) found that humanities, social sciences,
business, and education majors were more leniently graded
than physical sciences and engineering majors at all six
institutions studied, but the biological/health sciences
showed less consistent results. The findings from the
present investigation and Pennock-Roman (1990) are
consistent with many studies that have shown large
contrasts in grading leniency between quantitative and
nonquantitative majors (Elliott and Strenta 1988; Goldman
and Hewitt 1975; Goldman et al. 1974). 

Another issue regarding the categorization of ma- 
jors concerned the use of students' responses to the 
SDQ at the time that they were taking the SAT versus 
the institutional records of students' majors. Judging by 
the size of the contribution to R-square of the variable 
MAJSCAL, the procedure worked about as well for the 
two California universities using SDQ data as it did 
using information provided by the Texas and Massa- 
chusetts universities. However, it cannot be said from 
this study whether the two sources of information are 
interchangeable or equally valid. In future studies on the 
efficacy of using controls for college major, these results 
should be confirmed by comparing controls based on 
information from the SDQ with controls based on ac- 
tual institutional records on declared majors at the same 
university. 

Comparisons Across Regression 
Models 

Another goal of this study was to separate as much as 
possible the degree of differential prediction by gender 
attributable to the individual predictors: high school 
grades, SAT-V, and SAT-M. Although the standard, 
recommended practice for admission committees is to 
use these variables in combination rather than sepa- 
rately, these analyses can help pinpoint the source(s) of 
underprediction for female students. For example, it is 
important to know whether HSGPA underpredicts or 
overpredicts college grades. Because HSGPA is often 
higher for females than for males with the same test 
scores, one could interpret the findings in several ways. 
One could speculate that tests are biased against fe- 
males, that females are more diligent students, that 
teachers' grades may be biased against males, or that fe- 
males avoid high school science and mathematics 
courses with tough grading standards. There is evidence 
that a slightly higher percentage of males than females 
take high school mathematics and science courses (see
the High School and Beyond survey, Ekstrom et al.
1988, and the National Assessment of Educational
Progress, Mullis and Jenkins 1988). If patterns of high
school course-taking or teachers' biases raise high
school grades for females, then their grades would be
inflated by factors unrelated to later performance in
college. Thus, if this were true, we would expect to find
larger intercept differences for the HSGPA-only model 
as compared with the regressions involving test scores, 
particularly when controlling for differential grading 
standards in college courses by college major. The di- 
rection of the expected difference, if high school grades 
were inflated for females, would be that females' actual 
college grades would be lower than those predicted 
from the males' equations. 
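The expected direction can be made explicit with a short derivation for
the single-predictor case. The notation is introduced here for
illustration only: a_M and b_M are the males' intercept and slope, X* is
the HSGPA a female student would have had without inflation, and
\delta > 0 is a hypothetical inflation component unrelated to college
performance.

```latex
\begin{align*}
d &= \bar{Y}_F - \bigl(a_M + b_M \bar{X}_F\bigr),
   \qquad \bar{X}_F = \bar{X}^{*}_F + \delta \\
  &= \underbrace{\bar{Y}_F - \bigl(a_M + b_M \bar{X}^{*}_F\bigr)}_{\text{difference without inflation}}
     \;-\; b_M\,\delta
\end{align*}
```

With a positive slope b_M, inflation lowers the actual-minus-predicted
difference d by b_M \delta, so if nothing else separated the groups, d
would be negative (overprediction of females' college grades).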

The results do not support the hypothesis of in- 
flated high school grades for females for the majority of 
groups. For the African American and Latino American 
groups, the differences between predicted and actual 
mean FGPA for females after controlling for major were 
nearly always in the positive direction (underprediction
of college grades). There was only one negative value — 
the Latino American group at the university in Massa- 
chusetts (-0.042) — and the intercept and slope coeffi- 
cients jointly contributed less than .003 in that case. 
Among the non-Latino white students, the differences 
were consistently negative but closer to zero than the 
differences in all other models; all absolute values were 
less than 0.040 and the intercept and slope coefficient 
contrasts had no joint contributions to R-square larger 
than .003 in this group. The pattern of results for the 
Asian American group was somewhat different, as dis- 
cussed later. 

On the other hand, there was clear evidence that the
largest underprediction, on average, of female students'
grades resulted with the SAT-M-only model and this
underprediction persisted after controlling for grading
leniency by major. This underestimation of females'
grades using the males' equations occurred even when
there was a high proportion of females in quantitative
majors that nearly matched the proportion of males in
those fields. At the private California institution, the
percentages of males versus females majoring in the
sciences were nearly equal in the Latino American group
(42 percent versus 40 percent, respectively) and the
Asian American group (29 percent versus 25 percent,
respectively), yet the differences between actual and
predicted grades for females after controlling for
MAJSCAL were 0.054 for the Latino American group
and 0.045 for the Asian American group. In contrast,
differences between the actual and predicted grades for
females in these groups using the HSGPA-only model
were closer to zero (0.014 and -0.032, respectively).
These results are consistent with those of Bridgeman
and Wendler (1991), who found that SAT-M
underpredicted female students' grades in individual
mathematics courses. Thus, SAT-M underpredicted the
academic achievement of female students in mathematics
and in the broad spectrum of courses taken by science
and nonscience majors.

For the standard model and the SAT-V-only
models, gender differences were typically small after
controlling for leniency in grading standards, although
sometimes still statistically significant. The small
residual underprediction of females' grades was
consistent with many studies (Linn 1982; Ramist et al. 1994;
Stricker et al. 1991). Other variables not included here,
such as study habits and essay writing skills, may
account for these differences.

Aside from the issue of under- and overprediction,
the models were also evaluated in terms of the degree of
relationship between actual and predicted values
(R-square). The standard model showed the best overall
predictive ability, with smaller root mean square
errors and larger multiple R-squares than other models.
Thus, in agreement with the bulk of research in this area,
the results of this study support the use of the standard
model over others because the differences between
actual and predicted grades were not much larger than
with the HSGPA-only model, and the R-squares were
more substantial.

Gender Differences by 
Race/Ethnicity 

Another major goal of this investigation was to extend 
the analyses in the previous report to African American 
and Asian American groups at the same institutions
because data on gender differences for these groups are
less frequently reported. Owing to subgroup differences 
in language background and other variables, the exam- 
ination of subgroup variations may provide clues for 
sources of cross-cultural influences on gender differ- 
ences. For example, a higher incidence of bilingualism 
in a group may reduce the female advantage in essay- 
writing. If subgroups vary in degree and direction of 
gender differences, then ignoring such differences by 
combining all students in a single group may lead to in- 
consistent results across universities that differ in 
racial/ethnic composition. In this study, the combined 
groups were quite diverse and could not be considered 
equivalent; whereas the public institution in California 
had a high percentage of Asian Americans, the univer- 
sity in Texas had a high percentage of Latino Ameri- 
cans. Hence, the analysis by race/ethnicity may be more 
useful for making generalizations across institutions be- 
cause it separates population group differences from 
university-specific characteristics. 

In this study it was found that while the pattern of 
gender differences for the African American group was 
similar to that of the non-Latino white group, the
degree of underprediction of females' grades was slightly
larger in the former group (Table 5b). In contrast, the 
Asian American group showed slightly smaller differ- 
ences between females' actual grades and predicted 
grades than those found among the non-Latino white 
group. Gender differences were also smaller for the 
Latino American group than for the non-Latino white 
group, as found in the earlier analyses (Pennock-Roman 
1990). 

Reduced Gender Differences in the Asian 
American Group 

Unlike other groups, actual FGPA for Asian American 
females was found to be lower than predicted FGPA 
using the males' parameter estimates for the HSGPA- 
only equation without MAJSCAL. At three of four 
universities, these differences became slightly larger in 
absolute value when major was controlled, but only one 
of four contrasts had contributions of more than .01 to 
R-square. These results could be consistent with the 
hypothesis of high school grade inflation for females, 
although a variety of other factors considered below
could also produce similar findings. 

More equal male-female distributions across col- 
lege majors can be ruled out as an explanation for the 
somewhat smaller, and sometimes reversed, gender dif- 
ferences in the Asian American group. Contrary to the 
author's expectations, the distributions of quantitative 
versus nonquantitative majors revealed the same pat- 
tern found in other groups. With the exception of the 
private institution in California, substantially greater 
numbers of males than of females chose quantitative 
majors at the majority of institutions. The percentages 
of males versus females in the Asian American group 
choosing majors in the physical sciences and engi- 
neering were 47 percent versus 16 percent at the Texas 
university, 44 percent versus 19 percent at the university 
in Massachusetts, 47 percent versus 29 percent at the 
public institution in California, and 29 percent versus 
25 percent at the private institution in California. Hence 
the reduced gender differences in the Asian American 
group as compared with the non-Latino white group 
before controlling for grading leniency cannot be attrib- 
uted to the distribution of majors. 

There is also no apparent relationship between the
disparities in numbers of quantitative majors between 
the two sexes and the size of the underprediction of fe- 
males' grades among Asian American students before 
controlling for grading leniency. For example, the pro- 
portion of male and female students in quantitative 
majors was nearly equal at the private institution in 
California (29 percent among males and 25 percent
among females) and quite discrepant at the Texas
university (47 percent among males versus 16 percent
among females). Yet freshman grades (uncorrected for 
grading leniency) of females were underestimated by 
nearly the same amount (0.051 and 0.056 grade-point 
units at the Texas and private California institutions, 
respectively) when the males' equation for the SAT-M- 
only model was used. The differences between predicted 
and actual values for the standard model (leaving out 
MAJSCAL) were essentially zero at both institutions 
(-0.021 and 0.019 grade-point units at the Texas and 
private California universities, respectively). 

Two plausible hypotheses to explain the smaller 
gender differences in the Asian American group come to 
mind. One is that male Asian American students may 
have such high levels of motivation and conscientious- 
ness that they match females in their study habits, un- 
like males in other groups. A second hypothesis is that 
female Asian American students who are bilingual may 
have a lesser advantage in essay writing than non- 
Latino white female students. These hypotheses cannot 
be verified in the present study because there is no in- 
formation available on students' motivation and study 
habits or on essay writing skills. However, they may be 
worthy of future exploration. 

Sources of Gender Differences to 
Explain Variations Across 
Universities 

In an earlier analysis of these data using dummy-coding 
of college majors that were defined in the same way for 
all institutions (Pennock-Roman 1990), a significant
intercept difference was found for the non-Latino white
group at the Texas university (standard model). In the 
present study, controls for college major specifically tai- 
lored to each institution were used to see if better con- 
trols would eliminate this effect. Although the addition 
of the variable MAJSCAL did reduce gender differences, 
they remained statistically significant at the Texas insti- 
tution. Moreover, the amount of underprediction of fe- 
males' grades using the males' parameter estimates for 
the standard model including MAJSCAL (0.146 grade- 
point units) remained fairly large compared with that 
found in other studies of this type. 

Any number of factors not controlled for in the 
present investigation could account for the remaining 
differences in prediction for males and females at the 
Texas institution, but there is insufficient information 
available about courses and the social climate at this 
university to evaluate which factors are more involved. 
Perhaps there are more essay-type examinations in the
freshman year at the Texas university that give females
an advantage. It is possible that there is more flexibility 
in nonmajor courses than at other universities, and that 
the Texas university females take more leniently graded 
courses outside their majors than males do. Maybe the 
university environment provides more encouragement 
for academic pursuits by females than by males, who 
might be distracted by nonacademic pursuits (e.g., foot- 
ball). Although females have been found to be more 
conscientious in their study habits and more willing to 
seek help with their studies than are males in general, it 
is not clear why these factors would be more salient at 
the Texas institution than at the other three universities. 
All four institutions in this study can be expected to 
have fairly demanding curricula because they are selec- 
tive, major research universities. However, the Texas in- 
stitution has the largest student body (the large N for 
the university in Massachusetts reflects the inclusion of 
two freshman classes). This suggests a hypothesis that 
could be tested in future research. Perhaps gender dif- 
ferences in grades are larger at universities with imper- 
sonal, demanding environments because females have 
better coping skills. 



Conclusions 

In the majority of studies of gender differences in the 
prediction of college grades, the analyses focus on the 
joint prediction achieved by the combination of high 
school grades and SAT scores. In the present research, 
like that of Ramist et al. (1994) and a few others, dif- 
ferential prediction was examined for each individual 
predictor and the usual combination with the objective 
of exploring sources of possible gender differences. The 
results showed little evidence that female students' high 
school grades were inflated in non-Latino white, 
African American, and Latino American groups; some 
results that can be interpreted as weak evidence for in- 
flated high school grades were found for the Asian 
American group, however. On the other hand, the 
model using SAT-M as the only predictor consistently 
underestimated, on average, the college grades of fe- 
males in all groups, even after controlling for college 
major. Although there was underprediction on average, 
much variation among individual female students
occurred and the grades of some were actually
overpredicted. The consistent differences between the actual
average grades of females and those predicted by the
males' equations were not always associated with
statistically significant differences in intercepts and slopes.
Nevertheless, in several instances gender effects in the
SAT-M-only model, even after controlling for grading
leniency, were appreciably large, accounting for more
than one percent of the variance. The combination of 
the standard predictors had considerably higher accu- 
racy of prediction than any individual variable consid- 
ered separately, and the amount of underprediction of 
females' grades was usually small and not statistically 
significant. Thus, the standard model was the best 
choice, as found in many previous investigations. 

Although the labor-intensive methods employed by 
Ramist, Lewis, and McCamley (1994), Elliott and
Strenta (1988), Stricker, Rock, and Burton (1991), and
Young (1990, 1991), which depend on individual 
course grades, are more precise for adjusting grades for 
leniency and improving the reliability of predicted 
course grades, the procedure proposed here appears to 
be a much easier, more practical method to control for 
variations in grading standards by fields of study be- 
cause it avoids the analysis of transcript data and the 
problems of groups of unequal size across courses. This 
procedure improved the accuracy of prediction of 
freshman grades, but it is not possible to know in this 
study how its effectiveness compares with the control
for grading leniency achieved with the other methods. If 
the same data were analyzed with several methods, a 
comparative analysis of the effectiveness of each could 
be done. A comparative analysis of self-reported majors 
from the SDQ versus institutional data on majors would 
also be useful. 

What is known is that the improvement in predic- 
tion was seen not only in the original groups on which 
the classification of majors was made, but it was also 
cross validated in other groups at the same universities 
that were not involved in any way in the categorization 
of majors. The correction for grading leniency reduced, 
but did not completely eliminate, the underprediction of 
female students' grades by the males' equations among 
non-Latino white students, particularly at the Texas 
university. The reduction in the amount of under- 
prediction when MAJSCAL was added was smaller in 
the other groups. This small residual underprediction 
was consistent with past research controlling for 
grading leniency. 

Some trends in grading leniency associated with 
fields of study were consistent across universities (e.g., 
engineering majors were graded by tougher standards) 
and confirmed findings from past studies. Nevertheless, 
there were considerable variations across universities in 
some fields of study. The categorization of majors 
used here apparently had no greater advantage in re- 
ducing gender differences as compared with the 
dummy-variable approach to the coding of majors into 
four broad categories used in the earlier analyses 
(Pennock-Roman 1990). The evidence from several 
studies (Elliott and Strenta 1988; Goldman and Hewitt 
1975; Goldman et al. 1974; Pennock-Roman 1990) 
suggests that the most important distinction to make is 
that between quantitative and nonquantitative majors. 
A third category should perhaps be created to distin- 
guish the biological sciences from the quantitative sci- 
ences and other fields because grading leniency effects in 
biological science fields were less consistent. 

Comparing actual grades and those predicted from 
the males' equations, the African American group 
showed the most underprediction of female students' 
grades whereas the Asian American group showed the 
least underprediction. These analyses suggest that com- 
bining all racial/ethnic groups into a single group for the 
study of gender differences may reduce the compara- 
bility of results across universities because the composi- 
tion of the single group may vary greatly across institu- 
tions. Contrary to the author's expectation, male and 
female Asian American students were not more equally 
distributed among quantitative majors as compared 
with other groups. It is proposed that future studies ex- 
plore whether or not Asian American male and female 
students are more equally matched in study habits and 
essay writing skills than males and females in other 
racial/ethnic categories. 
Appendix 



Additional Notes on the 
Derivation of MAJSCAL 

As explained in the Method section, the variable 
MAJSCAL was created to reflect the grading leniency in 
courses in various categories of majors. It was based on 
the mean standardized residuals from the within-gender 
regressions of FGPA on HSGPA, SAT-V, and SAT-M 
for students in each category. Standardized residuals 
(residuals divided by their respective, within-gender 
standard errors) were used rather than the raw residuals 
for two reasons. 

First, standardized residuals compensated for pos- 
sible differences between males and females in the vari- 
ance of residuals; gender differences in the variance of 
residuals would have affected the interpretation of the 
size of a mean residual. Consistently, past studies have 
found smaller residual variances for females because 
preadmission measures tend to be more highly corre- 
lated with FGPA for females (Linn 1982; Morgan 1990; 
Ramist et al. 1994; Sawyer 1986). If no correction were 
made for possible differences in residual variance by 
gender, the residuals for males would tend to have more 
extreme values than those for females. Thus, the mean 
residuals for major categories dominated by males 
would be larger in absolute value and the mean resid- 
uals for major categories dominated by females would 
be smaller in absolute value. In a sense, these means 
would not be on the same scale and they would still re- 
flect gender effects (which we are trying to separate as 
much as possible from grading leniency). 

The second reason for standardizing the residuals 
was to give their distance from zero more interpretable 
units. One cannot know what is a relatively large or a 
relatively small raw residual (in absolute value) without 
examining the entire distribution of raw residuals for 
that regression analysis. 
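
The standardization step can be sketched as below. This
is a hedged illustration, not the author's program. It
assumes that the "within-gender standard error" is the
standard error of estimate of each gender's regression,
and the function names, major labels, and simulated
values are hypothetical.

```python
import numpy as np

def standardized_residuals(X, y):
    """OLS residuals of y on X divided by the standard error of estimate."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = len(y) - design.shape[1]
    see = np.sqrt(np.sum(resid ** 2) / dof)
    return resid / see

def mean_std_resid_by_major(X, y, gender, major):
    """Fit a separate regression per gender, then average the
    standardized residuals within each major category."""
    z = np.empty(len(y))
    for g in np.unique(gender):
        mask = gender == g
        z[mask] = standardized_residuals(X[mask], y[mask])
    return {str(m): float(z[major == m].mean()) for m in np.unique(major)}

# Synthetic example: males are given a larger residual variance,
# and "engineering" is graded more strictly than "humanities".
rng = np.random.default_rng(1)
n = 2000
gender = np.repeat(np.array(["M", "F"]), n // 2)
major = rng.choice(np.array(["engineering", "humanities"]), size=n)
X = rng.normal(size=(n, 3))  # stand-ins for HSGPA, SAT-V, SAT-M
shift = np.where(major == "engineering", -0.3, 0.3)
noise_sd = np.where(gender == "M", 0.6, 0.4)
fgpa = 2.8 + X @ np.array([0.3, 0.1, 0.1]) + shift + rng.normal(size=n) * noise_sd

means = mean_std_resid_by_major(X, fgpa, gender, major)
print(means)
```

Because each gender's residuals are divided by that
gender's own standard error, the two sets of residuals
end up on a common scale before they are averaged by
major, which is the point of the first reason given
above.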

For practical reasons, mean residuals for each 
major were grouped into intervals and then whole 
negative or whole positive values were assigned to 
MAJSCAL for students in that category of major ac- 
cording to the interval for that student's major (see 
Method). The use of whole numbers facilitated the key 
entry of values of MAJSCAL to the data set each time 
the categorization was revised. As explained in the 
Method section, the grouping for infrequent categories 
of majors was an iterative process involving several re- 
visions of MAJSCAL values each time the grouping of 
majors was changed. If MAJSCAL had been defined to 
be exactly equal to the mean residual, many more revi- 
sions would have had to be made at each iteration, 
sometimes for only a few tenths of a point change in the 
means. Admittedly, there would have been an advan- 
tage in having MAJSCAL values set equal to the exact 
value of the mean residuals, in that the control for 
grading leniency would have been slightly more accu- 
rate in the non-Latino white group. However, such an 
exact grading-leniency rating would still be only an ap- 
proximation of the grading-leniency rating in other 
racial/ethnic groups; it is not likely that using the exact 
means would have added any greater precision for con- 
trolling grading leniency in any group other than the 
one actually used to derive MAJSCAL. In sum, the extra 
precision obtained for just one group by the use of 
exact mean residuals did not seem worth the extra 
effort involved in adding MAJSCAL to the data set, ei- 
ther by key entry or by programming the assignment of 
MAJSCAL values for each university. 
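
The assignment of whole-number MAJSCAL values from
intervals of the mean residual might look like the sketch
below. The interval boundaries and the values attached to
them are hypothetical, chosen only for illustration. The
actual intervals are given in the Method section and are
not reproduced here.

```python
import numpy as np

# Hypothetical interval edges, in standardized-residual units.
EDGES = np.array([-0.3, 0.0, 0.3])
# Hypothetical whole-number ratings, one per interval (strict to lenient).
VALUES = np.array([-2, -1, 1, 2])

def majscal_from_mean(mean_resid):
    """Map a major category's mean standardized residual to a whole number."""
    return int(VALUES[np.digitize(mean_resid, EDGES)])

# A shift of a few hundredths in a category's mean after regrouping
# leaves the rating unchanged, so no data-set revision is needed.
before = majscal_from_mean(0.12)
after = majscal_from_mean(0.18)
print(before, after)
```

Under these assumed intervals, a category whose mean
residual moves from 0.12 to 0.18 keeps the same rating,
which illustrates why interval-based values needed fewer
revisions than exact means during the iterative
regrouping.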